Reflection without Remorse 



Revealing a hidden sequence to speed up monadic reflection 

Atze van der Ploeg Oleg Kiselyov 

Centrum Wiskunde & Informatica University of Tsukuba 

ploeg@cwi.nl oleg@okmij.org 



Abstract 

A series of list appends or monadic binds for many monads per- 
forms algorithmically worse when left-associated. Continuation- 
passing style (CPS) is well-known to cure this severe dependence 
of performance on the association pattern. The advantage of CPS 
dwindles or disappears if we have to examine or modify the inter- 
mediate result of a series of appends or binds, before continuing 
the series. Such examination is frequently needed, for example, to 
control search in non-determinism monads. 

We present an alternative approach that is just as general as CPS 
but more robust: it makes series of binds and other such opera- 
tions efficient regardless of the association pattern - and also pro- 
vides efficient access to intermediate results. The key is to represent 
such a conceptual sequence as an efficient sequence data structure. 
Efficient sequence data structures from the literature are homoge- 
neous and cannot be applied as they are in a type-safe way to series 
of monadic binds. We generalize them to type aligned sequences 
and show how to construct their (assuredly order-preserving) im- 
plementations. We demonstrate that our solution solves previously 
undocumented, severe performance problems in iteratees, LogicT 
transformers, free monads and extensible effects. 

Categories and Subject Descriptors D.3.2 [Programming Languages]:
Language Classifications - Applicative (functional) languages

Keywords performance; monads; reflection; data structures 

1. Introduction 

It is well-known that list-concatenation (++) is not efficient when
its left argument is itself the result of a concatenation. A popu- 
lar solution to this problem is to use continuation-passing style 
in the form of difference lists. We recall the problems of list- 
concatenation and how continuation-passing style remedies it in 
Sections 2 and 3 respectively. However, continuation-passing style 
only solves the performance problem for certain usage patterns: if 
we need to observe intermediate results of concatenations, or build 
concatenations with sub-lists of other concatenations, then perfor- 
mance quickly degenerates. In other words: continuation-passing 
style again leads to performance problems if we alternate between
building and observing. 

In this paper, we show that this pattern also occurs in many other 
situations, which at first blush have nothing to do with lists. In many 
implementations of monads (e.g., iteratees and non-determinism 
monads), a series of binds (>>=) or choices (mplus) is quite like
a series of list appends: they perform badly when left-associated. 
Like with lists, continuation-passing style makes such series per- 
form algorithmically well regardless of the association pattern [24]. 
However, several monads also support monadic reflection [6], 
a way to observe and modify (a representation of) the current 
state of the computation. For example, the current state of a non- 
deterministic computation may be observed as a stream of results. 
We may remove the top result and continue with the rest - which 
is exactly what is needed to implement committed choice [16]. 
Such monadic reflection destroys the performance advantage of the 
continuation-passing style. This paper shows that one does not have 
to regret reflection. 

For lists, the solution to the append-and-observe problem is to 
use a more suited sequence data structure, i.e. one that supports 
both head/tail and append operations efficiently. Such data struc- 
tures can give an asymptotic improvement over both regular lists 
and difference lists. The surprise of this paper is that such efficient 
data structures can also give an asymptotic improvement for other 
problematic occurrences of the build-and-observe pattern, in partic- 
ular, monads and monadic reflection. The key insight is that we can 
reveal the hidden, abstract sequence of monadic binds: we can represent
it as a concrete sequence. By then choosing the most suited
sequence data structure for the problem at hand, performance can 
be greatly improved. 

However, the literature on efficient sequences deals with homo- 
geneous collections. In a 'sequence' of binds, the types of the 'el- 
ements' may vary. To solve this problem, we introduce a general- 
ization of sequences called type aligned sequences: heterogeneous 
sequences where the types enforce the element order. In this way, 
we can solve the performance problem in any situation exhibiting 
the problematic pattern, in a completely type-safe way. 

We were confronted with the performance problems of monadic 
reflection in projects using monadic functional reactive program- 
ming [23] and the parallel composition of iteratees [15]. These 
practical problems have motivated the present research. We have 
distilled the issue into a performance problem with simple tree 
substitutions, which helped us see how changing the data repre- 
sentation to use efficient sequences can improve performance. This 
not only solves the original problem, but also gives a drop-in re- 
placement for free monads [22] with better performance character- 
istics than previous approaches: examining a free monad value and 
binding it are both efficient, letting us alternate between these op- 
erations without performance penalty. This improved free monad 
leads, among other things, to an implementation of extensible effects
[17] in which a wider range of effects can be modeled efficiently.

We begin with some background: Section 2 recalls the problematic
build-and-observe pattern in several guises, and Section 3 discusses
continuation-passing style and its performance problems. Then we
present our contributions:

• We present a solution to the build-and-observe problem for any 
monoid where left-associated expressions are more costly than 
right-associated expressions, giving an asymptotic running time 
improvement over both direct and continuation-passing style. 
(Section 4) 

• We generalize our solution for monoids to monads, making 
left-associated bind expressions as well as monadic reflection 
efficient. (Section 4) 

• We introduce type aligned sequences. As an example, we show 
an implementation of efficient type aligned queues. (Section 5) 

• We show how our method solves previously undocumented, se- 
vere performance problems with monadic reflection in iteratees, 
LogicT transformers, free monads and extensible effects. (Sec- 
tion 6) 

We conclude in Section 7.

The code accompanying this paper is available at: 
https://github.com/atzeus/reflectionwithoutremorse
The code in this paper is in Haskell, but our approach can be used 
in any language with GADTs (indexed data types). 

2. The problematic pattern and its cost 

In this background section we recall the performance problems of 
associative operators that traverse their left argument but not their 
right argument. In particular, we discuss list concatenation, tree 
substitution and generic tree substitution. We recall that the running 
time cost of equivalent expressions involving such operators can 
differ asymptotically. 

2.1 A first example: list concatenation 

To analyze the performance problems of list concatenation, we 
recall the relevant standard definitions: 

data [a] = [] | a : [a]

[]      ++ r = r
(h : t) ++ r = h : (t ++ r)

To append two lists, we must traverse all elements of the first list to 
arrive at the empty constructor at the end. Hence, reducing x ++ y to
normal form requires |x| + 1 case distinctions, from now on called
steps, where |x| is the length of x.

One might argue that this is not a problem: thanks to laziness, 
observing the head of x ++ y is just observing the head of x, plus one
extra step. To observe the n-th element of a list we must traverse the 
list anyway: concatenation just adds one extra step per element. 

The real problem arises if the left argument is itself the result 
of a concatenation. For example, in the expression (x ++ y) ++ z,
the list x must be traversed twice: it occurs twice in a left-hand-side
argument to ++. Hence, this expression runs in 2|x| + |y| + 2
steps, whereas the equivalent expression x ++ (y ++ z) runs in just
|x| + |y| + 2 steps. In this way, a wrong grouping of expressions
involving ++ can easily lead to severe performance problems, as we
shall see in full generality in §2.4.
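
The step counts above can be checked with a small instrumented append. This is only an illustrative sketch: the names appendC, leftSteps and rightSteps are hypothetical and not part of the paper's code, and the counter measures full evaluation rather than lazy, on-demand evaluation.

```haskell
-- List append instrumented to also return the number of case
-- distinctions ("steps") it performs: one per element of the left
-- argument, plus one for the final [].
appendC :: [a] -> [a] -> ([a], Int)
appendC []      r = (r, 1)
appendC (h : t) r =
  let (res, n) = appendC t r
  in  (h : res, n + 1)

-- Steps needed to fully evaluate (x ++ y) ++ z.
leftSteps :: [a] -> [a] -> [a] -> Int
leftSteps x y z =
  let (xy, n1) = appendC x y
      (_,  n2) = appendC xy z
  in  n1 + n2

-- Steps needed to fully evaluate x ++ (y ++ z).
rightSteps :: [a] -> [a] -> [a] -> Int
rightSteps x y z =
  let (yz, n1) = appendC y z
      (_,  n2) = appendC x yz
  in  n1 + n2
```

For |x| = 3 and |y| = 2, leftSteps yields 2*3 + 2 + 2 = 10 and rightSteps yields 3 + 2 + 2 = 7, independently of the length of z.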

2.2 Another example: Tree substitution 

A different guise of the same problem occurs with trees and an 
operation which substitutes the leaves of a tree with another tree: 



data Tree = Node Tree Tree
          | Leaf

(<-^) :: Tree -> Tree -> Tree
Leaf       <-^ y = y
(Node l r) <-^ y = Node (l <-^ y) (r <-^ y)

The performance situation is similar: evaluating (x <-^ y) <-^ z
traverses x twice, whereas the equivalent x <-^ (y <-^ z) only
traverses x once. Hence evaluating the former expression costs |x|
steps more than evaluating the latter, where |x| is now the number
of inner nodes in x.

For lists, this problem can be solved by simply using a catenable 
(meaning with fast concatenation) sequence data structure instead 
of a regular head-tail list. For trees, the solution is not so obvious. 
Should we investigate a new specialized data structure for trees or 
browse the literature to see if someone else has already invented it? 
(Hint: No.) 

2.3 A Monadic example: Generic trees 

The performance degradation from a bad association occurs not 
only with monoids, such as lists and trees. If we generalize our tree 
to a generic tree, with data at the leaves, then substitution becomes 
the monadic bind (>>=) (footnote 1):

data Tree a = Node (Tree a) (Tree a)
            | Leaf a

(<-^) :: Tree a -> (a -> Tree b) -> Tree b
(Leaf x)   <-^ f = f x
(Node l r) <-^ f = Node (l <-^ f) (r <-^ f)

instance Monad Tree where
  return = Leaf
  (>>=)  = (<-^)

The performance situation is obviously the same: the only thing 
that changed is that (<-^) now takes a function as its right argument.
Although (<-^) and (>>=) are not associative operators in the strict
sense, they satisfy the similar associativity monad law:

(m >>= f) >>= g = m >>= (\x -> f x >>= g)

We now see that the situation is the same: (m >>= f) >>= g runs in
|m| steps more than the equivalent m >>= (\x -> f x >>= g).

Note that while bind is not strictly an associative operator, the 
following operator, known as Kleisli composition, is strictly an 
associative operator: 

(>=>) :: Monad m => (a -> m b) -> (b -> m c) -> (a -> m c)
f >=> g = \x -> f x >>= g

The similarity with the situation with lists and non-generic trees 
can then be made even stronger: (p >=> q) >=> r is more costly
than the equivalent p >=> (q >=> r).
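
As a runnable illustration of the generic tree monad, the sketch below spells out the instances that current GHC versions require (Functor and Applicative in addition to Monad) and builds the same tree with both association patterns. The helper split and the values leftAssoc and rightAssoc are hypothetical names introduced only for this example.

```haskell
import Control.Monad (ap, liftM)

data Tree a = Node (Tree a) (Tree a)
            | Leaf a
  deriving (Eq, Show)

-- Functor and Applicative are derived from the Monad instance below.
instance Functor Tree where
  fmap = liftM

instance Applicative Tree where
  pure  = Leaf
  (<*>) = ap

-- Bind is leaf substitution: it traverses its left argument only.
instance Monad Tree where
  Leaf x   >>= f = f x
  Node l r >>= f = Node (l >>= f) (r >>= f)

-- Replace a leaf n by a node with leaves n and n + 1.
split :: Int -> Tree Int
split n = Node (Leaf n) (Leaf (n + 1))

-- Both expressions denote the same tree, but the left-associated one
-- traverses the result of the first bind a second time.
leftAssoc, rightAssoc :: Tree Int
leftAssoc  = (Leaf 0 >>= split) >>= split
rightAssoc = Leaf 0 >>= (\x -> split x >>= split)
```

By the associativity monad law, leftAssoc and rightAssoc are equal as values; only the evaluation cost differs.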

2.4 Asymptotic running time overhead 

In general, the problem occurs with any associative (or satisfying
the associativity monad law) operator (⊕) that traverses its left
argument but not its right argument and that operates on some
recursive data type X (footnote 2). In this situation, (x ⊕ y) ⊕ z
costs |x| more steps to evaluate than x ⊕ (y ⊕ z), where |x| is now
the number of values of type X inside x that are non-terminal (i.e.
they are not for example the empty list or a leaf).

1 This example is taken from [24].

2 If the data type is not recursive, e.g., the Maybe monad, one can easily see
that both left and right associations have the same asymptotic cost.

Figure 1: Equivalent left- and right-associated expressions.
(a) A left-associated expression. (b) A right-associated expression.

Repeated application of such an operator can lead to asymptotic
running time overhead if |a ⊕ b| ≥ |a| + |b|. For lists, this obviously
holds since |a ++ b| = |a| + |b|. For trees, the size of a <-^ b is
|a| + |a|_l |b|, where |a|_l is the number of leaves in the tree a. Since
there is at least one leaf in a tree, the inequality
|a <-^ b| ≥ |a| + |b| holds.
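
The size formula for trees can be checked on small examples. The following sketch uses a named function subst in place of the substitution operator; size and leaves are hypothetical helpers that count inner nodes and leaves respectively.

```haskell
data Tree = Node Tree Tree
          | Leaf
  deriving (Eq, Show)

-- Substitute every leaf of the first tree by the second tree.
subst :: Tree -> Tree -> Tree
subst Leaf       y = y
subst (Node l r) y = Node (subst l y) (subst r y)

-- Number of inner nodes, i.e. the size measure used in the text.
size :: Tree -> Int
size Leaf       = 0
size (Node l r) = 1 + size l + size r

-- Number of leaves.
leaves :: Tree -> Int
leaves Leaf       = 1
leaves (Node l r) = leaves l + leaves r
```

For any trees a and b, size (subst a b) equals size a + leaves a * size b, which is at least size a + size b because every tree has at least one leaf.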

That this leads to asymptotic running time overhead can be seen
as follows: a left-associated expression, as visualized in Figure
1(a):

\[ (((a_1 \oplus a_2) \oplus a_3) \cdots \oplus a_n) \oplus a_{n+1} \]

then costs at least $\sum_{i=1}^{n-1} (n - i)\,|a_i|$ more steps than the
equivalent right-associated expression, visualized in Figure 1(b):

\[ a_1 \oplus (a_2 \oplus (a_3 \oplus \cdots (a_n \oplus a_{n+1}) \cdots )) \]

If we assume that all elements have size one, i.e. $|a_i| = 1$, then we
more easily see that a left-associated expression costs $O(n^2)$ more
steps than a right-associated expression:

\[ \sum_{i=1}^{n-1} (n - i) = \sum_{i=1}^{n-1} i = \frac{n(n-1)}{2} \]


Of course, these are the most extreme cases: most expres- 
sions will not be completely right- or left-associated. However, 
any expression that is not completely right-associated will yield an 
overhead. We cannot expect the programmer to only form right- 
associated expressions, especially when using laziness: the pro- 
grammer must then make sure that every time the operator is used, 
the left hand side cannot be itself a result of this operator. 

3. A popular partial solution: 
Continuation-passing style 

In this second background section, we discuss a popular way to 
alleviate such performance problems for certain usage patterns, 
namely continuation-passing style. We illustrate this technique 
with difference lists, which use continuation-passing style to speed 
up list concatenation. We then show that difference lists only avoid 
performance problems if we do not alternate between building and 
observing and that the same holds for continuation-passing style in 
general. 

3.1 Difference lists 

The trick of difference lists [10] is to only build right-associated ex- 
pressions. More precisely, difference lists are functions for building 
right-associated expressions, i.e. functions of the form: 

\t -> a_1 ++ (a_2 ++ (a_3 ++ (a_4 ++ ... ++ t)))
And hence we define difference lists as functions from lists to lists: 



Figure 2: Difference list with worst case conversion characteristics. 



type DiffList a = [a] -> [a] 

We can convert a difference list to a regular list by simply feeding 
it the empty list: 

abs :: DiffList a -> [a] 
abs a = a [] 

To convert a list to a difference list, we partially apply ++:

rep :: [a] -> DiffList a
rep = (++)



Concatenation is then simply function composition, since (a ++) .
(b ++) = \t -> a ++ (b ++ t) (footnote 3):

(++) :: DiffList a -> DiffList a -> DiffList a
(++) = (.)

The trick is then to concatenate using difference lists, and then 
convert the result to a list when needed. Since this will always 
produce a right-associated expression, the overhead associated with 
expressions that are not right-associated is avoided. 

However, the problem with this technique is that converting a 
list to a difference list is expensive in the long run. Conversion of a 
list l to a difference list is simply (l ++), which, when the final result
is observed, contributes the cost of |l| steps, adding one operation
to each node in the list. Hence, if we convert back and forth n times,
this will cost n|l| steps. Of course, converting the same list back and
forth a number of times is a bit of a contrived situation. However, 
the problem also occurs if we convert a difference list to a list and 
convert part of the list back to a difference list. 

Another, more subtle problem is that conversion in the other 
direction, from a difference list to a list, is not a constant time 
operation. We cannot observe anything directly on a difference 
list, for example we cannot see whether it is empty, and hence 
conversion to a regular list is often required. This conversion is 
not cheap: in the worst case the difference list consists of a left- 
associated expression of the following form, which is visualized in 
Figure 2: 

((((a_1 ++) . (a_2 ++)) . (a_3 ++)) ... . (a_{n-1} ++)) . (a_n ++)

Converting such a difference list to a list, by applying it to [], then
requires n invocations of (.) to reduce to the following list expression:

a_1 ++ (a_2 ++ (a_3 ++ ... ++ (a_n ++ [])))

Only after these operations can we reduce further and inspect the
resulting list to see whether it is empty or not. Hence, observing 
(parts of) intermediate lists can also lead to performance problems. 



3 We use the notation (x ++) as a shorthand for (\y -> x ++ y).
To summarize: difference lists only solve performance problems
if our usage of lists is strictly separated into a build (i.e. concate- 
nation) phase and an observation phase. If we alternate between 
building and observing, as is often needed, then performance prob- 
lems will resurface. 
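
The following sketch packages difference lists as a newtype and adds two observation functions to make the cost of alternating between building and observing visible. The names absD, repD, appendD, headD and tailD are hypothetical; they are suffixed to avoid clashing with the Prelude.

```haskell
-- A difference list is a function that prepends its elements to a tail.
newtype DiffList a = DiffList ([a] -> [a])

-- Convert to a regular list by supplying the empty tail.
absD :: DiffList a -> [a]
absD (DiffList f) = f []

-- Convert a regular list by partially applying (++).
repD :: [a] -> DiffList a
repD xs = DiffList (xs ++)

-- Concatenation is function composition.
appendD :: DiffList a -> DiffList a -> DiffList a
appendD (DiffList f) (DiffList g) = DiffList (f . g)

-- Observing the head requires converting to a regular list first.
headD :: DiffList a -> Maybe a
headD d = case absD d of
  []      -> Nothing
  (x : _) -> Just x

-- Taking the tail converts to a list and back again, so every later
-- observation pays once more for the elements that remain.
tailD :: DiffList a -> DiffList a
tailD = repD . drop 1 . absD
```

Repeatedly calling headD and tailD on a difference list therefore goes through absD and repD at every step, which is the build-and-observe alternation described above.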

3.2 General Continuation-passing style 

The trick of difference lists, i.e. continuation-passing style, can be
applied in many situations. For example, it can be applied to any
monoid (footnote 4):

type DiffMonoid a = a -> a

abs :: Monoid a => DiffMonoid a -> a
abs a = a mempty

rep :: Monoid a => a -> DiffMonoid a
rep = mappend

instance Monoid a => Monoid (DiffMonoid a) where
  mempty  = id
  mappend = (.)

If we apply the trick to monads, we get the codensity monad
transformer [11], which is highly related to the continuation monad [18]:

type CodensityT m a = forall b. (a -> m b) -> m b

abs :: Monad m => CodensityT m a -> m a
abs a = a return

rep :: Monad m => m a -> CodensityT m a
rep = (>>=)

instance Monad m => Monad (CodensityT m) where
  return a = rep (return a)
    -- or equivalently: \k -> k a
  m >>= f = m . flip f
    -- or equivalently: \k -> m (\a -> f a k)

The codensity monad transformer is often used for solving the 
performance problems of left-associated expressions [4, 24]. As
with difference lists, this works fine if our usage is separated into
a build phase and an observation phase. However, if we have another
usage pattern, alternating between building and observing, the same
problems as with difference lists occur: continuation-passing style
reintroduces performance problems.
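
For readers who want to experiment, here is a sketch of the codensity transformer as an actual newtype, as footnote 4 suggests. The names absC, repC and runCodensityT are chosen for this example only. Unlike the presentation above, the instances below do not need a Monad m constraint, and Functor and Applicative instances are included because current GHC versions require them.

```haskell
{-# LANGUAGE RankNTypes #-}

newtype CodensityT m a =
  CodensityT { runCodensityT :: forall b. (a -> m b) -> m b }

instance Functor (CodensityT m) where
  fmap f (CodensityT m) = CodensityT (\k -> m (k . f))

instance Applicative (CodensityT m) where
  pure a = CodensityT (\k -> k a)
  CodensityT mf <*> CodensityT ma =
    CodensityT (\k -> mf (\f -> ma (k . f)))

-- Bind only composes continuations; no bind of m is performed here.
instance Monad (CodensityT m) where
  CodensityT m >>= f = CodensityT (\k -> m (\a -> runCodensityT (f a) k))

-- Run the computation by supplying return as the final continuation.
absC :: Monad m => CodensityT m a -> m a
absC (CodensityT m) = m return

-- Embed an m-computation by partially applying its bind.
repC :: Monad m => m a -> CodensityT m a
repC m = CodensityT (m >>=)
```

Observing an intermediate result means calling absC and, to continue building, repC again, which is where the performance advantage is lost.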

4. Solving the problem 

The main insight for our solution is that expressions of the form: 

\[ a_0 \oplus a_1 \oplus a_2 \oplus \cdots \oplus a_n \]

are sequences and that such abstract sequences should be repre- 
sented explicitly. With the previous approaches such sequences are 
only represented implicitly. More precisely, when directly using ⊕,
these sequences are implicitly represented at runtime as trees where 
the leaves are the elements and nodes are (delayed) function ap- 
plications. When using continuation-passing style, such sequences 
are also represented as trees, but now the leaves are functions rep- 
resenting the elements and the nodes are function composition. By 
making representation of these sequences explicit, we can choose a 
more suited sequence data structure and performance problems can 
be solved for any usage pattern. 

We first illustrate our solution by applying it to tree substitution. 
We then show that applying our solution to generic trees requires 
type aligned sequences and how such type aligned sequences can 
be used to solve the problem. Afterwards, we discuss the general 
solution. 

4.1 A first example: tree substitution 

We want to replace the implementation of the Tree data type and 
the substitution operator such that they have the same semantics, 

4 To reduce clutter, we ignore the fact that DiffMonoid and CodensityT 
should actually be a newtype in Haskell. 



but better performance characteristics. Hence we will redefine the 
following operations: 

• Observing a tree, i.e. viewing if it is a leaf or node. 

• Constructing a leaf or node. 

• The leaf substitution operator. 

We are not concerned with other operations on trees here, since they
can be defined in terms of the above operations.

Before we define our new data type Tree', let us start with 
defining what the result of observing a tree should be. Analogous 
to viewing a sequence data structure from the left or right, we can 
view a tree by observing if its root node is a leaf or a node: 

data TreeView = Node Tree' Tree' 
| Leaf 

Notice that the children of a Node are not of type TreeView, they 
are of the new (yet to be defined) Tree' type. To pattern match on 
a value of type Tree', we first need to call a function that gives the 
view of the Tree', i.e. a function of type: 

toView :: Tree' -> TreeView

This pattern is common in data abstraction [25]: it allows us to 
hide the implementation of the Tree' type, while still being able 
to pattern match on it. It is, for example, also used in efficient 
sequence data structures, such as the one in Data.Sequence: the
pattern is used to hide the implementation of the sequence such that
the user cannot differentiate between things which have multiple
representations but the same meaning.

The Glasgow Haskell Compiler has a syntactic extension called 
view patterns which eases the usage of such data types. More 
precisely, it allows us to apply such a view function inside a pattern 
match. As an example of this, with our previous tree data type we 
could write a function: 

isLeaf Leaf = True 
isLeaf _ = False 

With view patterns, this function on the new Tree' type becomes: 

isLeaf (toView -> Leaf) = True
isLeaf _ = False 

In this way, the syntactic inconvenience of our technique is mini- 
mized. 

The implementation of the Tree' data type is an explicit expression:
a sequence of trees a_0, a_1, ..., a_n, such that the result of
observing such a Tree' is a_0 <-^ a_1 <-^ ... <-^ a_n.

newtype Tree' = Tree' (CQueue TreeView) 

Where CQueue is an efficient sequence data structure, which we 
assume to be an instance of the type class for sequences defined in 
Figure 3(a). Very efficient purely functional sequence data struc- 
tures exist: data structures where both concatenation and head/tail 
access run in amortized constant time [20], and even data structures 
where both run in worst case constant time [14, 20]. 

The elements of the sequence are of type TreeView, which 
is mutually recursive with Tree': the children of the elements in 
the expression are again explicit expressions. The Tree' type is a 
newtype instead of a type alias, such that we can omit the Tree' 
constructor from the interface, making Tree' an abstract type. 

Constructing a leaf or node of type Tree' is then done by con- 
verting a TreeView value to a Tree' by using the following func- 
tion: 

fromView :: TreeView -> Tree'
fromView x = Tree' $ singleton x 
The resulting tree is not (yet) an argument to the substitution operator
and hence it is represented as a sequence of length one. Notice
that fromView is the inverse of toView. 

The implementation of the substitution operator (<-^) is then simply
to concatenate the two explicit expressions:

(<-^) :: Tree' -> Tree' -> Tree'
(Tree' l) <-^ (Tree' r) = Tree' (l >< r)

Since we are using an efficient sequence data structure, this con- 
catenation only takes (amortized) constant time. 

The implementation of (<-^) no longer defines how to actually
replace the leaves of a tree with another tree. Instead this logic is 
moved to the toView function, which converts an explicit expres- 
sion to its view (i.e. its head normal form). 

toView :: Tree' -> TreeView
toView (Tree' s) = case viewl s of
  EmptyL -> Leaf
  h :< t -> case h of
    Leaf     -> toView (Tree' t)
    Node l r -> Node (l <^ t) (r <^ t)
  where (<^) :: Tree' -> CQueue TreeView -> Tree'
        (Tree' l) <^ r = Tree' (l >< r)

Where viewl is a function that allows us to view the sequence from 
the left: see if it is empty or obtain the head and tail. In contrast to 
continuation-passing style, converting an explicitly represented ex- 
pression to an observable value does not mean converting the entire 
explicitly represented expression: we partially convert, keeping the 
children of a node as explicit expressions.

In this way, all operations we want to support, namely construc- 
tion, observation and substitution have become efficient operations. 
Moreover, the expressions (x <-^ y) <-^ z and x <-^ (y <-^ z) lead
to the same sequence, and hence performance does not depend on
the association pattern. It should hence come as no surprise that this
approach also solves performance problems if we alternate between
building trees using substitution and observing the result of such 
substitutions. 
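
The definitions of this section can be assembled into a runnable module. The sketch below makes two assumptions that are not from the paper: it uses Seq from Data.Sequence (containers) as a stand-in for CQueue, whose concatenation is logarithmic rather than constant time, and it spells the operators as (<-^) and (<^). The function size is a hypothetical helper that observes a whole tree.

```haskell
{-# LANGUAGE ViewPatterns #-}

import           Data.Sequence (Seq, ViewL (..), (><))
import qualified Data.Sequence as Seq

-- Explicit expression a_0 <-^ a_1 <-^ ... <-^ a_n, stored as a sequence.
newtype Tree' = Tree' (Seq TreeView)

data TreeView = Node Tree' Tree'
              | Leaf

-- A view becomes an explicit expression of length one.
fromView :: TreeView -> Tree'
fromView = Tree' . Seq.singleton

-- Substitution only concatenates the two explicit expressions.
(<-^) :: Tree' -> Tree' -> Tree'
Tree' l <-^ Tree' r = Tree' (l >< r)

-- Observation performs just enough substitution to expose the root.
toView :: Tree' -> TreeView
toView (Tree' s) = case Seq.viewl s of
  EmptyL -> Leaf
  h :< t -> case h of
    Leaf     -> toView (Tree' t)
    Node l r -> Node (l <^ t) (r <^ t)
  where
    -- Push the pending substitutions down into a child.
    Tree' l <^ r = Tree' (l >< r)

isLeaf :: Tree' -> Bool
isLeaf (toView -> Leaf) = True
isLeaf _                = False

-- Number of inner nodes, found by observing the whole tree.
size :: Tree' -> Int
size (toView -> Node l r) = 1 + size l + size r
size _                    = 0
```

With x a node with two leaf children, (x <-^ x) <-^ x and x <-^ (x <-^ x) produce the same underlying sequence of three views, so observing either one costs the same.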

4.2 Solving the performance problems of generic trees using 
type aligned sequences 

But what if we want to apply our solution to generic trees? We must 
then explicitly represent expressions of the form: 

m >>= f_1 >>= f_2 >>= f_3 ... >>= f_n

The problem is that each f_i has type a -> Tree b, for some a and b,
and these types can differ between elements. This means we cannot
use a regular sequence: to use it all elements must be of the same
type.

To be able to apply our solution to such situations, we generalize 
sequences to type aligned sequences: sequences parametrized by a 
type constructor c, such that each element is of type c a b, for some
a and b. If the last type argument to c of an element is a, then the first
type argument to c in the next element (if any) must be a. If we
set the type constructor c to (->), we get type aligned sequences
of functions: the output type of a function is then always the input 
type to the next function. 

In the next section we discuss such type aligned sequences in 
depth and show how they can be defined. For now, let us assume that
we have an efficient type aligned sequence data structure called 
TCQueue, which is an instance of the type aligned sequence type 
class defined in Figure 3(b). 

The elements in the sequence described above are of type
a -> Tree' b, for some a and b, except the first element m. We
need a type constructor to describe this pattern:

type TreeCont a b = a -> Tree' b



A type aligned sequence where each element is a TreeCont is then
of the following type (footnote 5):

type TreeCExp a b = TCQueue TreeCont a b

The situation is now a bit different than with our non-generic 
trees: an expression involving a series of binds must always start 
with an element of type Tree' a, whereas the rest of the elements 
are of type TreeCont a b, for some a and b. Hence, we implement 
the tree data type as explicit expression containing a first element 
and a sequence of right-hand-side arguments to bind. 

data Tree' a where
  Tree' :: TreeView x -> TreeCExp x a -> Tree' a

data TreeView a = Leaf a | Node (Tree' a) (Tree' a)

This definition uses an existential type x: the first element in the 
expression may be a tree of any type, as long as the result of the 
expression is a tree containing elements of type a. 

The fromView and (<-^) functions are adapted accordingly:

fromView :: TreeView a -> Tree' a
fromView x = Tree' x tempty

(<-^) :: Tree' a -> (a -> Tree' b) -> Tree' b
(Tree' x s) <-^ f = Tree' x (s >< tsingleton f)

As before, the actual logic of substitution is moved to the view 
function: 

toView :: Tree' a -> TreeView a
toView (Tree' b t) = case b of
  Leaf a -> case tviewl t of
    TEmptyL -> Leaf a
    h :< t  -> toView ((h a) <^ t)
  Node l r -> Node (l <^ t) (r <^ t)
  where (<^) :: Tree' a -> TreeCExp a b -> Tree' b
        (Tree' b l) <^ r = Tree' b (l >< r)

In this way, the performance problems for any usage pattern of 
generic trees have also disappeared by using type aligned se- 
quences. 
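
To make the generic construction concrete, the sketch below replaces TCQueue by a deliberately simple type aligned sequence stored as a binary tree. This stand-in is an assumption of the example, not the paper's data structure, and it does not have the worst-case bounds of the queues discussed in the next section. TreeCont is a newtype, as footnote 5 requires, and bind, extend and leaves are hypothetical names.

```haskell
{-# LANGUAGE GADTs #-}

-- Stand-in for TCQueue: a type aligned sequence stored as a binary tree.
data TCQueue c x y where
  TEmpty :: TCQueue c x x
  TLeaf  :: c x y -> TCQueue c x y
  TNode  :: TCQueue c x y -> TCQueue c y z -> TCQueue c x z

data TViewl c x y where
  TEmptyL :: TViewl c x x
  (:<)    :: c x y -> TCQueue c y z -> TViewl c x z

-- View from the left, re-associating nodes to the right on the way down.
tviewl :: TCQueue c x y -> TViewl c x y
tviewl TEmpty      = TEmptyL
tviewl (TLeaf e)   = e :< TEmpty
tviewl (TNode l r) = go l r
  where
    go :: TCQueue c x y -> TCQueue c y z -> TViewl c x z
    go TEmpty      rest = tviewl rest
    go (TLeaf e)   rest = e :< rest
    go (TNode a b) rest = go a (TNode b rest)

newtype TreeCont a b = TreeCont (a -> Tree' b)

type TreeCExp = TCQueue TreeCont

-- A first element plus the sequence of right-hand sides of binds.
data Tree' a where
  Tree' :: TreeView x -> TreeCExp x a -> Tree' a

data TreeView a = Leaf a | Node (Tree' a) (Tree' a)

fromView :: TreeView a -> Tree' a
fromView x = Tree' x TEmpty

-- Bind appends the continuation to the explicit expression.
bind :: Tree' a -> (a -> Tree' b) -> Tree' b
bind (Tree' x s) f = Tree' x (TNode s (TLeaf (TreeCont f)))

-- Append a sequence of pending continuations to a tree.
extend :: Tree' a -> TreeCExp a b -> Tree' b
extend (Tree' b l) r = Tree' b (TNode l r)

toView :: Tree' a -> TreeView a
toView (Tree' b t) = case b of
  Leaf a -> case tviewl t of
    TEmptyL          -> Leaf a
    TreeCont h :< t' -> toView (h a `extend` t')
  Node l r -> Node (l `extend` t) (r `extend` t)

-- Observe the whole tree as the list of its leaves.
leaves :: Tree' a -> [a]
leaves t = case toView t of
  Leaf a   -> [a]
  Node l r -> leaves l ++ leaves r
```

The type checker rejects any attempt to append a continuation whose argument type does not match the result type of the expression so far, which is the order-preserving guarantee that type alignment provides.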

4.3 The general case 

Suppose we have some recursive data type X and an associative 
operator traversing its left argument but not its right argument. The 
solution is then to replace the data type X by an abstract data type X' 
and rewrite the problematic operator by performing the following 
steps: 

1. Replace X with two mutually recursive data types: one for the 
abstract type containing the explicit expression (X') and one 
view type, which is the same as the original X, but the self- 
references have been replaced by X'. 

2. Define the original operator on X' by concatenating the explicit 
expressions. 

3. Define a fromView function that converts a view value to an 
X' expression by constructing an explicit expression with one 
element. 

4. Define a toView view function that evaluates an explicit expres- 
sion to its view, using the workings of the original operator. 

A type aligned sequence must be used if the type of the right 
argument of the operator depends on the type of the left argument 
of the operator. 

5 To reduce clutter, we ignore that TreeCont must be a newtype for this to
work in current Haskell.

class Sequence s where
  empty     :: s a
  singleton :: a -> s a
  (><)      :: s a -> s a -> s a
  viewl     :: s a -> ViewL s a

data ViewL s a where
  EmptyL :: ViewL s a
  (:<)   :: a -> s a -> ViewL s a

(a) A type class for regular sequences.

class TSequence s where
  tempty     :: s c x x
  tsingleton :: c x y -> s c x y
  (><)       :: s c x y -> s c y z -> s c x z
  tviewl     :: s c x y -> TViewl s c x y

data TViewl s c x y where
  TEmptyL :: TViewl s c x x
  (:<)    :: c x y -> s c y z -> TViewl s c x z

(b) A type class for type aligned sequences.

Figure 3: Type classes for type aligned and regular sequences.

Notice that explicitly representing expressions in this way
means that applying the operator with the identity element does
not necessarily immediately yield the original value. For example,
m >>= return and m are different expressions. However, we
cannot observe this difference by viewing m >>= return and m.
Hence, the identity element is an identity element up to observation.
Associativity laws directly hold, since sequence concatenation
is associative. To ensure that we do not accidentally differentiate
between m >>= return and m, it is important to define the result
of the above steps in a separate module and to not export the
constructor of X'.

This process gives an abstract type X', with operations to con- 
struct, observe (view) and apply the operator. We argue that this 
resulting data type X' has the same semantics as the original data 
type, provided that X' is abstract. We feel that a formalization of 
these steps and a proof of the isomorphism of X and X' should be 
possible, but it is beyond the scope of this paper. 

5. Type aligned sequences 

In the previous section, we saw that type aligned sequences are 
required to explicitly represent expressions involving operators 
where the type of the left argument depends on the type of the 
right argument. We now introduce type aligned sequences, discuss 
their relation with regular sequences, and show an example of how 
a sequence data type can be converted into a type aligned sequence 
data type. 

5.1 Definition and intuition 

Type aligned sequences are best explained by an example: a type 
aligned sequence of functions is a sequence f_1, f_2, f_3, ..., f_n such that
the composition of these functions f_1 . f_2 . f_3 . ... . f_n is well typed.
In other words: the result type of each function in the sequence 
must be the same as the argument type of the next function (if any). 
In general, the elements of a type aligned sequence do not have to 
be functions, i.e. values of type a — > b, but can be values of type 
(c a b), for some binary type constructor c. Hence, we define a type 
aligned sequence to be a sequence of elements of the type (c a_i b_i)
with the side-condition b_{i-1} = a_i. If s is the type of a type aligned
sequence data structure, then (s c a b) is the type of a type aligned
sequence where the first element has type (c a x), for some x, and 
the last element has type (c y b), for some y. 

It may be instructive to think of a type aligned sequence as a 
path through a directed graph. In this directed graph each node is 
a type and there is an edge from type a to type b for each value 
of type (c a b). Hence, we call a value of type (c a b) a c-edge.
A type aligned sequence of type (s c a b) is then a sequence of c-edges
such that they form a path from a to b through this graph: the
target of each edge is the source of the next edge. 

Type aligned sequences can be defined using Generalized Alge- 
braic Data Types (GADTs) [3]. As a simple example of this, con- 
sider a type aligned list: 

data TList c x y where
  Nil  :: TList c x x
  Cons :: c x y -> TList c y z -> TList c x z

In the graph interpretation, the empty type aligned sequence
corresponds to an empty path, and hence the empty list is a path from x
to x, for any x. The Cons constructor adds one c-edge to the front
of a path; the types ensure that the target of this c-edge is the source
of the rest of the path.
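To make the graph reading concrete, the following sketch instantiates c with the function arrow (->), so that a type aligned list is a chain of composable functions. The helper applyAll and the value example are illustrative names that do not appear in the paper, and the cons constructor is spelled Cons as in the text above.

```haskell
{-# LANGUAGE GADTs #-}

-- The type aligned list from the text, with the cons constructor named Cons.
data TList c x y where
  Nil  :: TList c x x
  Cons :: c x y -> TList c y z -> TList c x z

-- Apply a type aligned list of functions from front to back.
-- (applyAll is a hypothetical helper, not taken from the paper.)
applyAll :: TList (->) a b -> a -> b
applyAll Nil         x = x
applyAll (Cons f fs) x = applyAll fs (f x)

-- A path Int -> String -> Int -> Bool through the graph of types.
example :: TList (->) Int Bool
example = Cons show (Cons length (Cons even Nil))

-- Swapping two edges breaks the path and is rejected by the type checker:
-- bad = Cons length (Cons show (Cons even Nil)) :: TList (->) Int Bool
```

Here example first renders a number, then measures the length of its rendering, then tests whether that length is even; the target type of each edge is the source type of the next.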

5.2 Relation with regular sequences 

The only difference between regular sequences and type aligned
sequences is in their types: TList differs from the ordinary list only in
the more precise types of its constructors. In fact, type aligned se- 
quences are a generalization of regular sequences: any type aligned 
sequence can be used as a regular sequence, but not the other way 
around. We can use a type aligned sequence as a regular sequence 
by effectively "partially erasing" the extra types with the following 
construction: 

data AsUnitLoop a b c where UL :: a -> AsUnitLoop a () ()

By using this construction, there exists an edge from () to () for 
each value of type a in the graph interpretation. Since there are no 
other edges, the graph effectively has just one node: the other types 
are unreachable. Hence, a regular list a₁ : a₂ : a₃ : … : aₙ : [] of type
[a] corresponds to a type aligned list:

UL a₁ : UL a₂ : UL a₃ : … : UL aₙ : Nil

of type TList (AsUnitLoop a) () () . This type aligned list corre- 
sponds to a path of length n through the graph consisting solely of 
self-loops on (), where each edge corresponds to a value of type a. 
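The correspondence can be sketched as a pair of conversion functions. This is only an illustration under the assumption that the cons constructor of TList is called Cons; the names fromList and toList are ours and do not come from the paper.

```haskell
{-# LANGUAGE GADTs #-}

data TList c x y where
  Nil  :: TList c x x
  Cons :: c x y -> TList c y z -> TList c x z

-- Every value of type a becomes a self-loop on the single node ().
data AsUnitLoop a b c where
  UL :: a -> AsUnitLoop a () ()

-- Embed a regular list as a path of self-loops (hypothetical helper).
fromList :: [a] -> TList (AsUnitLoop a) () ()
fromList = foldr (Cons . UL) Nil

-- Forget the (trivial) type alignment again (hypothetical helper).
toList :: TList (AsUnitLoop a) () () -> [a]
toList Nil              = []
toList (Cons (UL a) as) = a : toList as
```

Matching on UL reveals to the type checker that the intermediate node is again (), which is what allows the recursive call in toList.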

We can use this construction to provide an instance for the 
regular sequence class (Figure 3(a)) for any instance of the type 
aligned sequence class (Figure 3(b)): 

type AsSequence s a = s (AsUnitLoop a) () ()

instance TSequence s => Sequence (AsSequence s) where
  empty     = tempty
  singleton = tsingleton . UL
  (.><)     = (><)

  viewl s = case tviewl s of
    TEmptyL   -> EmptyL
    UL h :| t -> h :< t

A benefit of using type aligned sequences in this way, instead 
of directly using regular sequences, is that type aligned sequences 
rule out a class of implementation bugs: the types in a type aligned 
sequence enforce the ordering of the elements. Hence, accidentally 
switching two elements will result in a type error, as the resulting 
sequence may not be a path. In contrast, in regular sequences the 
types do not enforce the ordering of the elements and an accidental 
change of order in, for instance, the definition of concatenation 
would have gone unnoticed by the type checker. 
data Pair c a b where
  (:*) :: c a w -> c w b -> Pair c a b

data Buffer c a b where
  B1 :: c a b      -> Buffer c a b
  B2 :: Pair c a b -> Buffer c a b

data Queue c a b where
  Q0 :: Queue c a a
  Q1 :: c a b -> Queue c a b
  QN :: Buffer c a x -> Queue (Pair c) x y
        -> Buffer c y b -> Queue c a b

(|>) :: Queue c a w -> c w b -> Queue c a b
q |> b = ...

viewl :: Queue c a b -> TViewl Queue c a b
viewl q = ...

Figure 4: A type aligned queue data structure.



In general, sequences, i.e. words over some alphabet, are free
monoids, whereas paths through a directed graph are free
categories [1]. Sequences in programming languages typically are
mogeneous: they require that each element has the same type. The 
alphabet is then the set of values of the given type. Similarly, type 
aligned sequences are paths through the directed graph where the 
edges are formed by the values of type (c a b), for all types a and 
b. 

Indeed, any sequence data type can be made an instance of 
Monoid, without assuming anything about the elements of the 
sequence. Similarly, any type aligned sequence data type can be 
made an instance of Category, without assuming anything about 
the elements of the type aligned sequence: 

instance Sequence s => Monoid (s a) where
  mempty  = empty
  mappend = (.><)

instance TSequence s => Category (s c) where
  id  = tempty
  (.) = flip (><)

The fact that we can use any type aligned sequence as a regular 
sequence also has a theoretical motivation: a monoid corresponds 
to a category with just one object, the elements in the monoid 
are now arrows (morphisms) from this one object to itself and the 
monoid operation is arrow composition [1]. Hence, a free monoid 
corresponds to the free category over a graph with just one node, 
where the self-edges correspond to the elements of the alphabet. 
This is exactly what we did with AsUnitLoop above: it makes every 
value of type a into a self-edge on the node (). 
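As a small illustration of the Category instance, the following sketch spells it out for the simple type aligned list of Section 5.1 rather than for an arbitrary TSequence. The names tappend and applyAll are our own, and the list-based concatenation is linear in its left argument, so this is meant to show the types only, not an efficient implementation.

```haskell
{-# LANGUAGE GADTs #-}
import Prelude hiding (id, (.))
import Control.Category

data TList c x y where
  Nil  :: TList c x x
  Cons :: c x y -> TList c y z -> TList c x z

-- Concatenation of paths. Writing (tappend ys t) in the second clause,
-- i.e. accidentally swapping the arguments, is a type error.
tappend :: TList c x y -> TList c y z -> TList c x z
tappend Nil        ys = ys
tappend (Cons h t) ys = Cons h (tappend t ys)

-- The empty path is the identity, concatenation is composition.
instance Category (TList c) where
  id  = Nil
  (.) = flip tappend

-- Run a path of functions from its source to its target
-- (hypothetical helper, not from the paper).
applyAll :: TList (->) a b -> a -> b
applyAll Nil         x = x
applyAll (Cons f fs) x = applyAll fs (f x)
```

With this instance, the generic combinators of Control.Category, such as (>>>), work on type aligned lists without any assumption about the edge type c.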

5.3 An example of making sequences type aligned: efficient 
queues 

Generalizing the types of a sequence data type so that it becomes a 
type aligned sequence data type, means generalizing the construc- 
tor types, and assuring (that is, "proving" to the type checker) that 
all operations on the data type preserve the element order. This gen- 
eralization requires some creativity but in our experience, it is a 
straightforward operation. In the code accompanying this paper we 
show type aligned versions of finger trees [9] and of a worst case 
constant time catenable queue [20, 21]. 

As a not entirely trivial example of turning a sequence data
structure into a type aligned sequence data structure, consider the
(non-catenable) queue shown in Figure 4. This data structure is
essentially the same as the queue presented in Okasaki's Purely
Functional Data Structures [21, §8.4] but the types have been
generalized.

To generalize this queue to a type aligned sequence data struc- 
ture, we needed to generalize not only the types of the constructors 
of the queue, but also the types of the constructors of the pairs and 
buffers of which it consists. Before generalizing the types, both el- 
ements of a pair had the same type, but now the elements are c- 
edges such that they form a path of length two. A buffer can hold 
either a single element or a pair and the types of these construc- 
tors have been generalized straightforwardly. Slightly less obvious 
is generalizing the types of the constructors of a queue. A queue 
may consist of nested queues: if a queue has more than one ele- 
ment (constructor QN), it is represented as two buffers and a queue 
of pairs. With generalized types, the type of this queue of pairs is a 
type aligned queue holding (Pair c)-edges, i.e. paths of length two. 

The only difference in the operations, namely en-queuing and
viewing the head/tail, is in their type signatures: the operations
themselves are left unchanged and are hence not shown. The full code
for these type aligned queues is included in the code accompanying
this paper.
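Although the operations are omitted from Figure 4, a sketch of how they might look can help in reading the types. The code below is our reconstruction under stated assumptions, not the authors' accompanying code: the pair constructor is written (:*), the view type TViewl with constructors TEmptyL and (:|) is assumed to be as in Figure 3(b), and shiftLeft, bufToQueue and runQueue are hypothetical helper names.

```haskell
{-# LANGUAGE GADTs #-}

data Pair c a b where
  (:*) :: c a w -> c w b -> Pair c a b

data Buffer c a b where
  B1 :: c a b      -> Buffer c a b
  B2 :: Pair c a b -> Buffer c a b

data Queue c a b where
  Q0 :: Queue c a a
  Q1 :: c a b -> Queue c a b
  QN :: Buffer c a x -> Queue (Pair c) x y -> Buffer c y b -> Queue c a b

-- Assumed shape of the left view of a type aligned sequence.
data TViewl s c x y where
  TEmptyL :: TViewl s c x x
  (:|)    :: c x y -> s c y z -> TViewl s c x z

-- Enqueue at the rear; a full rear buffer is pushed into the queue of pairs.
(|>) :: Queue c a w -> c w b -> Queue c a b
Q0            |> b = Q1 b
Q1 a          |> b = QN (B1 a) Q0 (B1 b)
QN l m (B1 a) |> b = QN l m (B2 (a :* b))
QN l m (B2 r) |> b = QN l (m |> r) (B1 b)

-- View the front; an exhausted front buffer is refilled from the middle.
viewl :: Queue c a b -> TViewl Queue c a b
viewl Q0                     = TEmptyL
viewl (Q1 a)                 = a :| Q0
viewl (QN (B2 (a :* b)) m r) = a :| QN (B1 b) m r
viewl (QN (B1 a) m r)        = a :| shiftLeft m r

shiftLeft :: Queue (Pair c) a w -> Buffer c w b -> Queue c a b
shiftLeft q r = case viewl q of
  TEmptyL -> bufToQueue r
  l :| m  -> QN (B2 l) m r

bufToQueue :: Buffer c a b -> Queue c a b
bufToQueue (B1 a)        = Q1 a
bufToQueue (B2 (a :* b)) = QN (B1 a) Q0 (B1 b)

-- Apply a queue of functions front to back (for demonstration only).
runQueue :: Queue (->) a b -> a -> b
runQueue q x = case viewl q of
  TEmptyL -> x
  f :| t  -> runQueue t (f x)
```

Both (|>) and viewl call themselves at the element type (Pair c), so their type signatures are required: this is polymorphic recursion, which cannot be inferred.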

6. Fast Monadic Reflection 

In this section we show how our solution can be used in various 
real-life monads. In particular, several monads offer monadic re- 
flection: a way to observe, or reify, the internal state of the com- 
putation, represented in a suitable data structure. For example, the 
internal state of a non-determinism monad can be observed as a 
stream of choices. This terminology is due to Filinski [6] who mod- 
eled it after the terminology of Wand and Friedman [7]. Monadic 
reflection leads to alternating between building and observing, and 
hence leads to previously undocumented, severe performance prob- 
lems. We demonstrate several examples of how we can factor out 
sequences in monads such that monadic reflection can be efficiently 
supported. In particular, we discuss LogicT transformers, iteratees 
(and related constructs), free monads and extensible effects. 

6.1 LogicT Monad Transformers 

As a first example of how we can apply our solution to a practi- 
cal example, consider non-determinism monads. The MonadPlus 
type class extends the Monad interface with support for non- 
deterministic choice with backtracking. The most obvious instance 
of this interface is the list monad: bind is then concatMap (with 
the order of the arguments reversed) and mplus is concatenation. 
The usage of list concatenation can lead to performance problems, 
which can be solved by simply using a catenable queue instead. 

Kiselyov, Shan, Friedman and Sabry [16] showed that a large 
class of logical effects, namely cut, soft cut, interleaving and fair 
conjunction, can all be expressed when a single function is added 
to the interface. This function, called msplit, essentially splits the 
logical computation into a computation of the first result and com- 
putation of the rest of the results. More precisely, this function has 
type: 

class MonadPlus m => MonadLogic m where
  msplit :: m a -> m (Maybe (a, m a))

It takes a logical computation and turns it into another logical 
computation, namely one which returns Nothing if the original 
logical computation had no results, and otherwise returns a Just 
value carrying a tuple of the first result and the logical computation 
of the rest of the results. This is an instance of monadic reflection: 
msplit allows us to observe the internal state of the monad as 
a stream of results. The implementation of this msplit function 
for lists and other sequence data structures is straightforward: it 
converts the empty sequence to Nothing and a non-empty sequence 
to a Just value of the head and tail. 
newtype ML m a = ML { toView :: m (Maybe (a, ML m a)) }
fromView = ML

single a = return (Just (a, mzero))

instance Monad m => Monad (ML m) where
  return = fromView . single
  (toView -> m) >>= f = fromView $ m >>= \x -> case x of
    Nothing    -> return Nothing
    Just (h,t) -> toView (f h `mplus` (t >>= f))
  fail _ = mzero

instance Monad m => MonadPlus (ML m) where
  mzero = fromView (return Nothing)
  mplus (toView -> a) b = fromView $ a >>= \x -> case x of
    Nothing    -> toView b
    Just (h,t) -> return (Just (h, t `mplus` b))

instance MonadTrans ML where
  lift m = fromView (m >>= single)

instance Monad m => MonadLogic (ML m) where
  msplit (toView -> m) = lift m

(a) Original implementation.

newtype ML m a = ML (CQueue (m (Maybe (a, ML m a))))
fromView = ML . singleton

instance Monad m => MonadPlus (ML m) where
  mzero = ML empty
  mplus (ML a) (ML b) = ML (a .>< b)

toView :: Monad m => ML m a -> m (Maybe (a, ML m a))
toView (ML s) = case viewl s of
  EmptyL -> return Nothing
  h :< t -> h >>= \x -> case x of
    Nothing          -> toView (ML t)
    Just (hi, ML ti) -> return (Just (hi, ML $ ti .>< t))
-- the other code is unchanged

(b) Changes to the original implementation.

Figure 5: A stream implementation of MonadLogic



However, an efficient monad transformer that adds non-de- 
terminism to an arbitrary monad is not defined so easily. In a 
functional pearl [8], Hinze systematically derives such a non- 
determinism monad transformer implementation. He then notes 
that a left-associated mplus expression has quadratic performance, 
and solves this by using continuation-passing style. Note that 
there is no problem with bind for a non-determinism monad: like 
concatMap for lists, it traverses both the left argument and (the 
result of) the right argument. Kiselyov et al. show how the monad 
transformer implementation of Hinze can be adapted such that it is 
also an instance of MonadLogic. Although it can be really tricky 
to see this directly from the code, this instance of MonadLogic 
has severe performance problems. Effectively, their implementation
of msplit corresponds to converting a difference list to a list
and converting the tail of the list to a difference list again. Hence,
each invocation of msplit will add one extra operation per result in 
the remainder of the logical computation. 

Their implementation uses continuation-passing style with two 
continuations, but the point of this paper is that it is better to 
make the sequence explicit instead of representing it as a tree 
of functions (i.e. CPS). Hence, we do not apply our method to 
this implementation, but to a standard stream implementation of 
backtracking [26] as shown in Figure 5(a). In this implementation,
the ML type is essentially a list where each node of the list is
the result of a computation in the underlying monad. The list can
be empty (Nothing) or a head and tail (Just (a, ML m a)). The
definitions are then analogous to the definitions for the lists: mplus
is concatenation and >>= is like concatMap.

Notice that ML is not the same as the ListT construction: 

newtype ListT m a = ListT { runListT :: m [a] } 
instance Monad m => Monad (ListT m) where ... 

This construction only yields a monad if the argument monad, m, 
is commutative [12]. The difference is that in ML each node in 
the "list" is the result of a computation in the underlying monad, 
whereas with the ListT construction the entire list is the result of a 
single computation in the underlying monad. 

An example of the asymptotic performance problem is the fol- 
lowing function which obtains at most n solutions of a logical com- 
putation. 

seqN :: MonadLogic m => Int -> m a -> m [a]
seqN n m
  | n == 0    = return []
  | otherwise = msplit m >>= \x -> case x of
      Nothing    -> return []
      Just (a,m) -> liftM (a:) (seqN (n-1) m)

Figure 6(a)⁶ shows, for different implementations, the running time
of obtaining n natural numbers using seqN, where the natural
numbers are defined as follows⁷:

nats = natsFrom 1 where
  natsFrom n = return n `mplus` natsFrom (n + 1)

Obtaining a number of solutions requires us to recursively split 
the logical computation, and hence the two continuation implemen- 
tation as implemented in Hackage package LogicT has quadratic 
running time. Of course, this is just a micro-benchmark constructed 
to illustrate the problem. However, this problem does not only oc- 
cur on the natural numbers: it occurs any time we request only 
some, instead of all, solutions to a logical computation. This is 
highly counter-intuitive: it is much faster to obtain all results than 
some results. Moreover, since we are talking about monad trans- 
formers, requesting all results is not always an option: it may in- 
voke undesired and/or irrevocable effects in the underlying monad. 
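For readers who want to experiment, the stream implementation of Figure 5(a) together with seqN and nats can be made to compile on a recent GHC roughly as follows. This is a sketch under a few assumptions of ours: the Functor, Applicative and Alternative instances are additions required by newer versions of base, view patterns are replaced by ordinary pattern matching, msplit is a standalone function rather than a class method, and firstResult is a hypothetical helper for the Identity monad.

```haskell
import Control.Applicative (Alternative (..))
import Control.Monad (MonadPlus (..), ap, liftM)
import Data.Functor.Identity (Identity (..))

newtype ML m a = ML { toView :: m (Maybe (a, ML m a)) }

instance Monad m => Functor (ML m) where fmap = liftM
instance Monad m => Applicative (ML m) where
  pure a = ML (return (Just (a, mzero)))
  (<*>)  = ap
instance Monad m => Monad (ML m) where
  ML m >>= f = ML $ m >>= \x -> case x of
    Nothing     -> return Nothing
    Just (h, t) -> toView (f h `mplus` (t >>= f))
instance Monad m => Alternative (ML m) where
  empty = ML (return Nothing)
  ML a <|> b = ML $ a >>= \x -> case x of
    Nothing     -> toView b
    Just (h, t) -> return (Just (h, t <|> b))
instance Monad m => MonadPlus (ML m)

-- Reflection: expose the first result and the rest of the computation.
msplit :: Monad m => ML m a -> ML m (Maybe (a, ML m a))
msplit (ML m) = ML (m >>= \x -> return (Just (x, mzero)))

seqN :: Monad m => Int -> ML m a -> ML m [a]
seqN n m
  | n == 0    = return []
  | otherwise = msplit m >>= \x -> case x of
      Nothing      -> return []
      Just (a, m') -> liftM (a :) (seqN (n - 1) m')

nats :: Monad m => ML m Int
nats = natsFrom 1
  where natsFrom n = return n `mplus` natsFrom (n + 1)

-- First result of a pure logical computation (hypothetical helper).
firstResult :: ML Identity a -> Maybe a
firstResult = fmap fst . runIdentity . toView
```

In this representation msplit merely runs one step of the underlying computation, which is why repeated splitting, as in seqN, involves no conversion between representations.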

The same problem occurs with the interleave operator as de- 
scribed by Kiselyov et al., which ensures fair consideration be- 
tween two branches of a logical computation. An example usage 
of this operator is the following logical computation:

unfair = do x <- nats `mplus` return 0
            if x == 0 then return x else mzero

The behavior of mplus in these implementations is that it first con- 
siders all solutions from its left argument, and only afterwards con- 
siders the solutions of its right argument. Since nats has an infinite 
number of results, this computation will never yield a solution. If 
interleave is used instead of mplus, then solutions from nats and 
return 0 are considered alternately and the computation will yield 
a solution. This interleave operator is defined in terms of mplus 
and msplit as follows: 

interleave :: MonadLogic m => m a -> m a -> m a
interleave l r = msplit l >>= \x -> case x of
  Nothing    -> r
  Just (h,t) -> return h `mplus` interleave r t
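The behaviour of interleave is easiest to see in the list monad, where msplit is just a case distinction on the list. The following sketch specialises both functions to lists; the names msplitList and interleaveList are ours, chosen to avoid suggesting that they are part of the MonadLogic interface.

```haskell
-- msplit specialised to lists: the empty list has no first result,
-- a non-empty list is split into its head and tail.
msplitList :: [a] -> [Maybe (a, [a])]
msplitList []      = [Nothing]
msplitList (h : t) = [Just (h, t)]

-- interleave specialised to lists, following the definition in the text;
-- mplus on lists is (++).
interleaveList :: [a] -> [a] -> [a]
interleaveList l r = msplitList l >>= \x -> case x of
  Nothing     -> r
  Just (h, t) -> return h ++ interleaveList r t

-- With interleaving, the solution 0 is found although the first
-- argument has infinitely many results.
fair :: [Integer]
fair = filter (== 0) (interleaveList [1 ..] [0])
```

Taking the head of fair terminates, whereas the same filter applied to [1 ..] ++ [0] would never produce a result.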



⁶ These measurements are the median of 5 runs and were performed on an
AMD Phenom II X4 905e Processor CPU running Linux 3.2.0 on binaries
produced with the GHC 7.6.3 (optimization level 2). The fixed stream
implementation uses a worst case constant time catenable queue.

⁷ (a `f` b) is an alternative notation for (f a b).

Figure 6: Running time of msplit and mplus micro benchmarks for LogicT.
(b) Running time of observing all results in a left-associated
mplus expression with n elements.



Since interleave recursively splits the remaining computation of 
both arguments, any usage of it while using a two continuation 
implementation of backtracking will lead to performance problems. 
For instance, the following logical computation: 

test = choose [1..n] `interleave` choose [n, n-1 .. 1]
  where choose l = foldr mplus mzero (map return l)

also runs in O(n²). The same problem occurs when using the
fair conjunction operator, which is defined in terms of interleave. 
The cut and soft cut operators are also problematic, but much less 
severely: they only split the logical computation once. 

Obtaining only a limited number of solutions and using the 
interleaving or fair conjunction operators is not problematic when 
using the ML implementation of MonadLogic: we can observe 
results directly by running a computation in the underlying monad: 
there is no conversion involved. Instead, the problem is now mplus: 
it traverses the left hand argument but not the right hand argument. 
Figure 6(b) shows the running time of obtaining all solutions of a 
left-associated mplus expression: 

test :: MonadPlus m => Int -> m Int
test n = foldl mplus mzero (map return [1..n])

Now the running time of the ML implementation is quadratic. The 
dual continuation implementation does not suffer the same prob- 
lem, as it was originally derived by Hinze to solve this problem. 
Hence, the performance characteristics of the ML implementation
are opposite to those of the two continuation implementation: the
ML implementation has quadratic performance on a left-associated
mplus expression, but no performance problem with msplit.

Applying our solution to the ML implementation yields the 
changes that are shown in Figure 5(b). The changes are very similar 
to the changes to the (non-generic) Tree data type: we change the 
ML data type to an explicit expression involving mplus, and the 
actual logic of non-deterministic choice is moved to the toView 
function. As can be seen from the graphs, after applying our method 
the problem with mplus disappears: the running time is now linear. 
Moreover, this stream implementation with our method applied to 
it is the only implementation which efficiently supports both msplit 
and mplus. 
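The changes of Figure 5(b) can be tried out with Data.Sequence from the containers package standing in for the catenable queue CQueue. This substitution is our assumption for the sake of a runnable sketch: Data.Sequence offers logarithmic rather than worst case constant time concatenation, and because it is strict in its spine we would not expect infinite computations such as nats to terminate with it. The helper observeAll is likewise hypothetical.

```haskell
import Control.Applicative (Alternative (..))
import Control.Monad (MonadPlus (..), ap, liftM)
import Data.Functor.Identity (Identity (..))
import Data.Sequence (Seq, ViewL (..), viewl, (><))
import qualified Data.Sequence as Seq

-- An explicit sequence of mplus arguments (Seq replaces CQueue here).
newtype ML m a = ML (Seq (m (Maybe (a, ML m a))))

fromView :: m (Maybe (a, ML m a)) -> ML m a
fromView = ML . Seq.singleton

-- The logic of non-deterministic choice lives in the observation function.
toView :: Monad m => ML m a -> m (Maybe (a, ML m a))
toView (ML s) = case viewl s of
  EmptyL -> return Nothing
  h :< t -> h >>= \x -> case x of
    Nothing          -> toView (ML t)
    Just (hi, ML ti) -> return (Just (hi, ML (ti >< t)))

instance Monad m => Functor (ML m) where fmap = liftM
instance Monad m => Applicative (ML m) where
  pure a = fromView (return (Just (a, mzero)))
  (<*>)  = ap
instance Monad m => Monad (ML m) where
  m >>= f = fromView $ toView m >>= \x -> case x of
    Nothing     -> return Nothing
    Just (h, t) -> toView (f h `mplus` (t >>= f))
instance Monad m => Alternative (ML m) where
  empty         = ML Seq.empty
  ML a <|> ML b = ML (a >< b)   -- mplus no longer traverses its left argument
instance Monad m => MonadPlus (ML m)

-- Observe all results of a pure computation (hypothetical helper).
observeAll :: ML Identity a -> [a]
observeAll m = case runIdentity (toView m) of
  Nothing     -> []
  Just (a, t) -> a : observeAll t

-- The left-associated mplus expression from the text.
test :: MonadPlus m => Int -> m Int
test n = foldl mplus mzero (map return [1 .. n])
```

Observing test n now only appends and views the underlying sequence, so the cost of a left-associated chain of mplus no longer grows with the size of the left argument at every step.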

6.2 Iteratees and related monads 

As a second example of how we can apply our solution to a practical
example, consider iteratees [15]: a style of incremental input
processing that overcomes the problems of lazy I/O and handle-based
I/O. We consider a simplified version of iteratees where an
iteratee is a monadic computation that can request an input element,
as shown in Figure 7.

data It i a = Get (i -> It i a) | Done a

instance Monad (It i) where
  return = Done
  (Done x) >>= g = g x
  (Get f)  >>= g = Get (f >=> g)

get :: It i i
get = Get return

Figure 7: Iteratees before applying our solution.

An iteratee is in one of two possible states: the constructors of 
the It data type. If an iteratee is Done it simply carries the value it 
produces. If an iteratee needs an input element, it is a Get value, 
carrying a function that when given the input element returns the 
next iteratee state. A Monad instance for such iteratees is then 
defined straightforwardly. In this definition, the (>=>) operator is
Kleisli composition (f >=> g = \x -> f x >>= g) as introduced in
section 2.3.

Although it can be easy to miss, the definition of the monadic 
bind, like its definition in the original paper, exhibits the problem- 
atic pattern: it traverses its left argument but not its right argument. 
It does not matter that (>>=) invokes itself by using function
composition instead of application; this just obfuscates the problem.

An example of the performance problem is the following iteratee
computation, that gets n elements from the input and then returns
their sum:

sumInput :: Int -> It Int Int
sumInput n = Get (foldl (>=>) return (replicate (n - 1) f))
  where f x = get >>= return . (+ x)

Here replicate n e is a function that creates a list of length
n, where each element is e. The sumInput function yields an
expression of the form:

Get ((((return >=> f) >=> f) >=> f) ... >=> f)

Figure 8 shows that when the argument to Get is called with a new
input element x, it costs O(n) steps to obtain the next iteratee state:

Get (((((return . (+ x)) >=> f) >=> f) >=> f) ... >=> f)

This is very similar to the original expression, exhibiting the same
problem. Hence, the running time of feeding this iteratee computation
n elements and obtaining their sum is quadratic. The sumInput
function can easily be made to run in linear time by simply switching
from foldl to foldr. However, in general solving such performance
problems by avoiding the problematic pattern is not as simple:
we must then make sure that each left argument to bind
cannot be the result of a bind.
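The iteratee definitions and sumInput can be run as follows. The Functor and Applicative instances are additions needed on a recent GHC, and feedAll is a hypothetical driver of ours that supplies input elements from a list; neither appears in the paper.

```haskell
import Control.Monad (ap, liftM, (>=>))

data It i a = Get (i -> It i a) | Done a

instance Functor (It i) where fmap = liftM
instance Applicative (It i) where
  pure  = Done
  (<*>) = ap
instance Monad (It i) where
  Done x >>= g = g x
  Get f  >>= g = Get (f >=> g)   -- traverses the left argument only

get :: It i i
get = Get return

-- Left-associated Kleisli compositions: the problematic pattern.
sumInput :: Int -> It Int Int
sumInput n = Get (foldl (>=>) return (replicate (n - 1) f))
  where f x = get >>= return . (+ x)

-- Feed input elements until the iteratee is done; Nothing if the
-- input runs out first (hypothetical driver).
feedAll :: [i] -> It i a -> Maybe a
feedAll _        (Done a) = Just a
feedAll (x : xs) (Get k)  = feedAll xs (k x)
feedAll []       (Get _)  = Nothing
```

Each call of k in feedAll has to walk through the whole chain of left-nested compositions again before the next Get surfaces, which is the source of the quadratic behaviour.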

We can solve the problem with repeated binds by using the co- 
density monad transformer, as defined in Section 3.2, as proposed 
by Voigtländer [24]. When using this method, we only use codensity
transformed iteratees to build monadic expressions:

type ItCo i a = CodensityT (It i) a

We then redefine get so that it gives a codensity transformed itera- 
tee: 

getCo :: ItCo i i 
getCo = rep get 

A monadic expression built in this way will then always result in 
a right-associated expression when converted to a regular iteratee 
computation, thus avoiding the problem of repeated binds. 

We now find ourselves in a familiar situation: this method makes 
alternating between building and observing problematic. An exam- 
ple of this is the following, often useful, parallel iteratee compo- 
sition function, defined as a regular (non-codensity transformed) 
iteratee function: 

par :: It i a -> It i b -> It i (It i a, It i b)
par l r
  | Done _ <- l = Done (l, r)
  | Done _ <- r = Done (l, r)
  | Get f <- l, Get g <- r = get >>= \x -> par (f x) (g x)

This operator runs both iteratees in parallel, feeding each input
element to both, until at least one of the iteratees is done. Afterwards,
the remaining iteratee computation of both arguments is returned,
which can then be composed again with other iteratees using par
and >>=. The par function is an instance of monadic reflection: we
observe the internal state of both iteratees.
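A small usage example may clarify what par returns. The sketch below repeats the iteratee definitions so that it is self-contained; the iteratees two and one and the driver feedAll are illustrative names of ours, and the Functor and Applicative instances are again additions for recent GHC versions.

```haskell
import Control.Monad (ap, liftM, (>=>))

data It i a = Get (i -> It i a) | Done a

instance Functor (It i) where fmap = liftM
instance Applicative (It i) where
  pure  = Done
  (<*>) = ap
instance Monad (It i) where
  Done x >>= g = g x
  Get f  >>= g = Get (f >=> g)

get :: It i i
get = Get return

-- Run both iteratees on the same input until one of them is done.
par :: It i a -> It i b -> It i (It i a, It i b)
par l r
  | Done _ <- l = Done (l, r)
  | Done _ <- r = Done (l, r)
  | Get f <- l, Get g <- r = get >>= \x -> par (f x) (g x)

-- An iteratee that needs two inputs and one that needs a single input.
two :: It Int [Int]
two = do { a <- get; b <- get; return [a, b] }

one :: It Int Int
one = get

-- Feed input elements from a list (hypothetical driver).
feedAll :: [i] -> It i a -> Maybe a
feedAll _        (Done a) = Just a
feedAll (x : xs) (Get k)  = feedAll xs (k x)
feedAll []       (Get _)  = Nothing
```

After a single input element, par two one is done: its right component has finished, while its left component is a remaining iteratee that still expects one element and can be composed further.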

If we want to use par on codensity transformed iteratees, we 
need to redefine it as follows: 

parCo :: ItCo i a -> ItCo i b
      -> ItCo i (ItCo i a, ItCo i b)
parCo l r = rep (par (abs l) (abs r)) >>=
            (\(l, r) -> return (rep l, rep r))

We need to eliminate the codensity transformer using abs to observe
the states of both iteratees. After applying the original par
function, we want to be able to compose the resulting iteratees
again with >>= and parCo. However, they are no longer codensity
transformed iteratees, while other iteratees are in this form to
avoid the problems with bind. We need to convert the rest of the
resulting iteratees back to codensity transformed form. Hence, each
invocation of parCo adds an extra operator per Get in the remaining
iteratee, which can easily lead to performance problems when
iteratees are long lived and used in many invocations of parCo.

A related construction is monadic coroutines, which are like 
iteratees except that they also output an element each time they 
request an input element. Blazevic [2] presents an extensive library 
for such coroutines, but his coroutine definition suffers from the 
same problem as the original iteratee definition. 

Another guise of the same situation occurs in monadic FRP [23]: 
a framework which essentially applies coroutines in a functional re- 
active programming (FRP) setting. In monadic FRP, a combinator 
very similar to par is at the heart of composing reactive computa- 
tions and the bind in the paper has the same problem as the original 
iteratees. In fact, the motivation for this work is that we noticed 
that our monadic FRP program became progressively slower, due 
to repeated application of bind on the results of par, and eventually 
came to a grinding halt. Since par is used often in monadic FRP, 
and coroutines can live for a long time, being used in many invo- 
cations of par, the use of the codensity monad would also lead to 
a severe slowdown. With our solution applied, monadic FRP pro- 
grams no longer become progressively slower, running efficiently 
no matter what the usage pattern. 

Our solution can be applied to iteratees, coroutines and monadic 
FRP. By using an efficient type aligned sequence data structure,
performance improves dramatically, without constraining ourselves
by disallowing functions involving monadic reflection like
par. We do not show the code for this due to space considerations, 
but instead note that iteratees, coroutines and monadic FRP are all 
instances of a construction known as a free monad, which we dis- 
cuss and show the improved code of in the next section. 

6.3 Free Monads 

Swierstra [22] shows how a monad instance can be defined for any 
functor, resulting in a monad that is called the free monad [1] on 
that functor. This construction is defined as follows: 

data FreeMonad f a = Pure a
                   | Impure (f (FreeMonad f a))

instance Functor f => Monad (FreeMonad f) where
  return = Pure
  (Pure x)   >>= f = f x
  (Impure t) >>= f = Impure (fmap (>>= f) t)

Swierstra then notes that several well known monads are free mon- 
ads. For example, the Maybe monad is the free monad on the fol- 
lowing functor: 
data One a = One deriving Functor

Now (Pure a) corresponds to (Just a) and (Impure One) corre- 
sponds to Nothing. 
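The correspondence with Maybe can be written down as a pair of conversions. In this sketch the names toMaybe and fromMaybeF are ours, and the Functor and Applicative instances for FreeMonad are additions needed on a recent GHC.

```haskell
{-# LANGUAGE DeriveFunctor #-}
import Control.Monad (ap, liftM)

data FreeMonad f a = Pure a
                   | Impure (f (FreeMonad f a))

instance Functor f => Functor (FreeMonad f) where fmap = liftM
instance Functor f => Applicative (FreeMonad f) where
  pure  = Pure
  (<*>) = ap
instance Functor f => Monad (FreeMonad f) where
  Pure x   >>= f = f x
  Impure t >>= f = Impure (fmap (>>= f) t)

-- A functor with a single shape and no positions for values.
data One a = One deriving Functor

-- Hypothetical conversion helpers witnessing the correspondence.
toMaybe :: FreeMonad One a -> Maybe a
toMaybe (Pure a)     = Just a
toMaybe (Impure One) = Nothing

fromMaybeF :: Maybe a -> FreeMonad One a
fromMaybeF (Just a) = Pure a
fromMaybeF Nothing  = Impure One
```

Since fmap on One has nothing to map over, binding after Impure One leaves it unchanged, which mirrors how Nothing short-circuits in the Maybe monad.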

However, for many functors this construction leads to asymp- 
totic problems. Consider for example the following Functor: 

newtype Get i a = Get (i -> a) deriving Functor

A free monad on this functor corresponds to the iteratees we saw in 
the previous section. Free monads over the following functors: 

data Node a = Node a a deriving Functor

data Yield out inn a = Yield out (inn -> a) deriving Functor

correspond to the generic trees with substitution and coroutines, 
respectively. It should come as no surprise that the performance 
problem of iteratees, generic trees and coroutines did not go 
away by formulating them as free monads. Again, we could use 
continuation-passing style, but this would make functions like par 
expensive. 

We solve this problem for all free monads by simply applying
our solution. The definition of free monads then becomes: 

type FC f a b    = a -> FreeMonad f b
type FMExp f a b = TCQueue (FC f) a b

data FreeMonad f a where
  FM :: FreeMonadView f x -> FMExp f x a -> FreeMonad f a

data FreeMonadView f a = Pure a
                       | Impure (f (FreeMonad f a))

fromView x = FM x tempty

toView :: Functor f => FreeMonad f a -> FreeMonadView f a
toView (FM h t) = case h of
  Pure x ->
    case tviewl t of
      TEmptyL  -> Pure x
      hc :| tc -> toView (hc x >>>= tc)
  Impure f -> Impure (fmap (>>>= t) f)
  where
    (>>>=) :: FreeMonad f a -> FMExp f a b -> FreeMonad f b
    (FM h t) >>>= r = FM h (t >< r)

instance Monad (FreeMonad f) where
  return = fromView . Pure
  (FM m r) >>= f = FM m (r >< tsingleton f)

Notice that this code is very similar to the code we got from 
applying our solution to our generic tree example in Section 4.2. 
This should come as no surprise: generic trees are free monads. 

As usual, the code for these adapted free monads is included 
in the code accompanying this paper, as well as a benchmark 
demonstrating the performance problem and that our method solves 
it. 
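To see the adapted definition type check and run, the following sketch replaces the efficient TCQueue by the simple type aligned list of Section 5.1. That substitution is our assumption and gives up the efficiency that is the point of the construction: concatenation is linear here, so only the structure of the code is illustrated. We also make FC a newtype so that it can be partially applied, name the local operator bindExp, and add a hypothetical Get-based example with a driver feedAll.

```haskell
{-# LANGUAGE GADTs, DeriveFunctor #-}
import Control.Monad (ap, liftM)

-- Stand-in for TCQueue: a type aligned list with linear-time append.
data TList c x y where
  Nil  :: TList c x x
  Cons :: c x y -> TList c y z -> TList c x z

tappend :: TList c x y -> TList c y z -> TList c x z
tappend Nil        ys = ys
tappend (Cons h t) ys = Cons h (tappend t ys)

newtype FC f a b = FC (a -> FreeMonad f b)
type FMExp f a b = TList (FC f) a b

-- A view of the head together with the pending continuations.
data FreeMonad f a where
  FM :: FreeMonadView f x -> FMExp f x a -> FreeMonad f a

data FreeMonadView f a = Pure a | Impure (f (FreeMonad f a))

fromView :: FreeMonadView f a -> FreeMonad f a
fromView x = FM x Nil

toView :: Functor f => FreeMonad f a -> FreeMonadView f a
toView (FM h t) = case h of
  Pure x -> case t of
    Nil            -> Pure x
    Cons (FC k) tc -> toView (k x `bindExp` tc)
  Impure f -> Impure (fmap (`bindExp` t) f)

bindExp :: FreeMonad f a -> FMExp f a b -> FreeMonad f b
bindExp (FM h t) r = FM h (tappend t r)

instance Functor (FreeMonad f) where fmap = liftM
instance Applicative (FreeMonad f) where pure = fromView . Pure; (<*>) = ap
instance Monad (FreeMonad f) where
  FM m r >>= f = FM m (tappend r (Cons (FC f) Nil))   -- bind only enqueues

-- Example: iteratees as the free monad over Get.
newtype Get i a = Get (i -> a) deriving Functor

get :: FreeMonad (Get i) i
get = fromView (Impure (Get return))

-- Left-nested binds summing n inputs (illustrative).
sumInputs :: Int -> FreeMonad (Get Int) Int
sumInputs n = foldl step (return 0) [1 .. n]
  where step m _ = m >>= \acc -> get >>= \x -> return (acc + x)

feedAll :: [i] -> FreeMonad (Get i) a -> Maybe a
feedAll xs m = case toView m of
  Pure a         -> Just a
  Impure (Get k) -> case xs of
    []       -> Nothing
    (y : ys) -> feedAll ys (k y)
```

Note that the Monad instance needs no Functor constraint: bind merely records the continuation, and fmap is only used when the computation is observed through toView.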

6.4 Extensible effects 

Recently Kiselyov, Sabry, Swords and Foppa introduced extensible 
effects [17]: a framework for composing and implementing compu- 
tational effects that overcomes the problems of monad transformers 
in terms of efficiency, expressiveness and ease of notation. In this 
framework an effect is an interaction between a client and a han- 
dler: the client sends a value describing the desired effect to the 
handler, which in turn executes the desired effect and passes the 
result to the client. 

The approach of Kiselyov et al. uses functors to describe both 
which effect to request and how to continue afterwards. For exam- 
ple, both the request to modify a state and how to proceed after- 
wards, are represented by the following functor: 

data ModifyState s w =
  ModState (s -> s) (s -> w) deriving Functor



The first argument tells the handler how to modify the state,
whereas the second argument tells the handler how to continue
afterwards: it takes the new state and then produces some w. The
free monad over this functor is then the value that is interpreted by 
the handler: if the value is Impure (ModState f c), it applies the 
function f to the state and calls the function c with the new state. 
This may again yield an Impure value and the process continues 
until the handler sees a Pure value. 
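The interpretation loop described above can be sketched for the plain (non-adapted) free monad over this single functor. The client operation modifyS and the handler runModify are names we introduce for illustration; they are not taken from the extensible effects library, and the open union machinery is left out entirely.

```haskell
{-# LANGUAGE DeriveFunctor #-}
import Control.Monad (ap, liftM)

data FreeMonad f a = Pure a
                   | Impure (f (FreeMonad f a))

instance Functor f => Functor (FreeMonad f) where fmap = liftM
instance Functor f => Applicative (FreeMonad f) where
  pure  = Pure
  (<*>) = ap
instance Functor f => Monad (FreeMonad f) where
  Pure x   >>= f = f x
  Impure t >>= f = Impure (fmap (>>= f) t)

data ModifyState s w =
  ModState (s -> s) (s -> w) deriving Functor

-- Client side: request a state modification and receive the new state
-- (hypothetical helper).
modifyS :: (s -> s) -> FreeMonad (ModifyState s) s
modifyS f = Impure (ModState f Pure)

-- Handler side: apply f to the state, continue with c on the new state,
-- and stop at a Pure value (hypothetical helper).
runModify :: s -> FreeMonad (ModifyState s) a -> (a, s)
runModify s (Pure a)                = (a, s)
runModify s (Impure (ModState f c)) = let s' = f s in runModify s' (c s')
```

For example, starting from state 1, first adding 1 and then multiplying by 10 passes the intermediate states 2 and 20 back to the client.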

The extensible in extensible effects comes from the fact that 
handlers do not interpret a free monad over a single functor, but 
a free monad over an open union of functors. An open union is a 
value that can be of any type in a set of types. This distinguishes
it from a closed union, for example Either a b, which has a fixed list
of types. Kiselyov et al. then show an implementation of an open
union of functors, which in itself is again a functor. In this way
handlers for different effects can be stacked: if a handler does not 
handle the desired effect, the value describing the effect is passed 
to the next handler in the stack. 

However, as we saw in the previous section, many functors 
give rise to performance problems when using a (non-adapted) 
free monad. For functors describing effects, this is the case if the 
effect produces some result which is then passed to a continuation 
function. This is always the case, except for exceptions. 

Kiselyov et al. avoid this problem by using a variant of free 
monads using continuation-passing style. This has the advantage 
that it avoids the performance problems of wrong groupings of 
expressions involving bind, but it has the disadvantage that handlers 
must be written in continuation-passing style. In a related paper, 
Kammar et al. [13] avoid the performance problem by (implicitly) 
applying the codensity monad. 

Both approaches lead to performance problems when effects re- 
quiring reflection such as iteratees, LogicT transformers or delim- 
ited continuations are modeled. With our solution, extensible ef- 
fects can directly be expressed as (adapted) free monads over open 
unions, without the need for manual continuation-passing style or 
the codensity monad. Moreover, effects that require reflection can 
then be efficiently supported. An example implementation of ex- 
tensible effects as efficient free monads is included in the code ac- 
companying this paper, as well as a benchmark involving reflection 
in the form of a logical cut effect, that is quadratic in the original 
implementation, but linear in our adapted implementation. 

7. Conclusion 

Associative operators that traverse their left argument, but not their 
right argument, can lead to asymptotic overhead. A popular cure is 
to use continuation-passing style, but this cure is only effective if 
our usage is strictly separated into a build and an observation phase, 
otherwise the cure is as bad as the disease. 

We presented a solution that solves such performance problems 
for any usage pattern, even when alternating between building 
and observing. Our solution reveals a hidden sequence, namely 
repeated applications of such a problematic operator, and makes 
it concrete using an efficient sequence data structure. 

To support operators where the type of the right argument de- 
pends on the type of the left argument, such as the monadic bind, 
we introduced a generalization of sequences called type aligned 
sequences. Type aligned sequences enforce the ordering of their el- 
ements, and hence rule out ordering bugs. 

Monadic reflection, i.e. a way to observe, or reify, the internal
state of a monadic computation, requires us to alternate between
building and observing. We showed that reflection does not have
to lead to remorse: our solution efficiently supports reflection. We 
have demonstrated that our solution can yield an asymptotic run- 
ning time improvement in iteratees (and related constructs), LogicT 
transformers, free monads and extensible effects. 
Our solution is not limited to the examples we discussed in this
paper. In the accompanying code, we show how sequences can be 
factored out in delimited continuations [5] and term monads [19]. 
Given the simplicity of the problematic pattern and the widespread 
usage of continuation-passing style, we suspect that there are many 
more applications of our solution hiding in corners where we have 
not looked yet. 